# Visual Text Generation
Wan2.1 T2V 1.3B GGUF
Apache-2.0
A direct GGUF conversion of Wan2.1-T2V-1.3B, suitable for text-to-video generation on consumer-grade GPUs.
Text-to-Video English
samuelchristlie
155
0
Gemma 3 12B IT QAT AutoAWQ
Gemma 3 is Google's lightweight open model series based on Gemini technology, supporting multimodal input and text output.
Image-to-Text
Safetensors
gaunernst
498
3
Documentcogito
Apache-2.0
A multimodal model fine-tuned from unsloth/Llama-3.2-11B-Vision-Instruct, optimized for vision-language tasks with enhanced instruction following. Training was accelerated 2x using the Unsloth framework.
Text-to-Image
Transformers English

Daemontatox
73
1
EraX VL 7B V1.5 GGUF
Apache-2.0
A quantized version of EraX-VL-7B-V1.5 that supports Vietnamese, English, and Chinese, suitable for OCR and insurance-related tasks.
Image-to-Text Supports Multiple Languages
mradermacher
190
1
Donut Base Finetuned ZhTrainTicket
MIT
A Donut model fine-tuned on ZhTrainTicket for OCR-free document image-to-text conversion.
Image-to-Text
Transformers

naver-clova-ix
362
0
Donut Base Finetuned Cord V2
MIT
Donut is an OCR-free document understanding Transformer model composed of a visual encoder (Swin Transformer) and a text decoder (BART), capable of directly extracting text information from images.
Image-to-Text
Transformers

naver-clova-ix
21.63k
97